9 research outputs found
Leveraging contextual embeddings and self-attention neural networks with bi-attention for sentiment analysis
People express their opinions and views in different and often ambiguous ways; hence the meaning of their words is often not explicitly stated and frequently depends on the context. This makes it difficult for machines to process and understand the information conveyed in human language. This work addresses the problem of sentiment analysis (SA). We propose a simple yet comprehensive method which uses contextual embeddings and a self-attention mechanism to detect and classify sentiment. We perform experiments on reviews from different domains, as well as on languages from three different language families, including morphologically rich Polish and German. We show that our approach is on a par with state-of-the-art models or even outperforms them in several cases. Our work also demonstrates the superiority of models leveraging contextual embeddings. In sum, this paper takes a step towards building a universal, multilingual sentiment classifier.
Peer Reviewed. Postprint (published version).
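The pooling step such a model relies on can be sketched as follows: a single learned attention query scores each contextual token vector and the weighted sum becomes the sentence representation fed to a sentiment classifier. This is a minimal single-query sketch, not the paper's full bi-attention architecture, and all names and dimensions are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(H, w):
    """Collapse a sequence of contextual token vectors H (seq_len, dim)
    into one sentence vector using a learned attention query w (dim,)."""
    scores = softmax(H @ w)  # one weight per token, summing to 1
    return scores @ H        # attention-weighted sum -> (dim,)

# toy "contextual embeddings" for a 5-token sentence, dimension 8
rng = np.random.default_rng(0)
H = rng.normal(size=(5, 8))
w = rng.normal(size=8)
sentence_vec = attention_pool(H, w)  # would feed a classification head
```

In practice `H` would come from a pretrained contextual encoder and `w` would be trained jointly with the classifier; here both are random placeholders.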
Enhancing Word Embeddings with Knowledge Extracted from Lexical Resources
In this work, we present an effective method for semantic specialization of
word vector representations. To this end, we use traditional word embeddings
and apply specialization methods to better capture semantic relations between
words. In our approach, we leverage external knowledge from rich lexical
resources such as BabelNet. We also show that our proposed post-specialization
method based on an adversarial neural network with the Wasserstein distance
yields improvements over state-of-the-art methods on two tasks: word
similarity and dialog state tracking.
Comment: Accepted to ACL 2020 SR
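The specialization idea described above can be illustrated with an attract-only toy version: vectors of words that a lexical resource links as synonyms are nudged toward each other. This sketch omits the paper's adversarial network and Wasserstein distance entirely; the function name, learning rate, and word pairs are all hypothetical.

```python
import numpy as np

def specialize(emb, synonym_pairs, lr=0.1, steps=10):
    """Pull vectors of resource-linked synonym pairs toward each
    other; words outside the pairs are left untouched."""
    emb = {w: v.astype(float).copy() for w, v in emb.items()}
    for _ in range(steps):
        for a, b in synonym_pairs:
            delta = emb[b] - emb[a]
            emb[a] += lr * delta  # pull a toward b
            emb[b] -= lr * delta  # and b toward a
    return emb

# toy vectors; a real run would use pairs extracted from e.g. BabelNet
vectors = {"cheap": np.array([1.0, 0.0]),
           "inexpensive": np.array([0.0, 1.0])}
tuned = specialize(vectors, [("cheap", "inexpensive")])
```

After specialization the two synonym vectors lie closer together than before, which is the property the word-similarity evaluation rewards.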
Refinement of Unsupervised Cross-Lingual Word Embeddings
Cross-lingual word embeddings aim to bridge the gap between high-resource and
low-resource languages by making it possible to learn multilingual word
representations
even without using any direct bilingual signal. The lion's share of the methods
are projection-based approaches that map pre-trained embeddings into a shared
latent space. These methods are mostly based on the orthogonal transformation,
which assumes language vector spaces to be isomorphic. However, this criterion
does not necessarily hold, especially for morphologically-rich languages. In
this paper, we propose a self-supervised method to refine the alignment of
unsupervised bilingual word embeddings. The proposed model moves vectors of
words and their corresponding translations closer to each other and enforces
length- and center-invariance, thus better aligning
cross-lingual embeddings. The experimental results demonstrate the
effectiveness of our approach, as in most cases it outperforms state-of-the-art
methods in a bilingual lexicon induction task.
Comment: Accepted at the 24th European Conference on Artificial Intelligence (ECAI 2020)
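The orthogonal-transformation assumption mentioned above can be made concrete with the classical Procrustes solution, which projection-based methods typically use, together with a length/center normalization of the kind the refinement enforces. This is a generic sketch of those standard building blocks, not the paper's refinement model; all names and toy data are illustrative.

```python
import numpy as np

def orthogonal_map(X, Y):
    """Solve min_W ||X W - Y||_F over orthogonal W (Procrustes, via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def length_center_normalize(X, iters=3):
    """Alternate unit-length scaling and mean-centering of the rows,
    approximating length- and center-invariance before alignment."""
    for _ in range(iters):
        X = X / np.linalg.norm(X, axis=1, keepdims=True)
        X = X - X.mean(axis=0, keepdims=True)
    return X

# recover a known rotation between two toy embedding spaces
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                  # "source language" vectors
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))   # ground-truth rotation
Y = X @ R                                      # "target language" vectors
W = orthogonal_map(X, Y)                       # learned mapping
```

When the two spaces really are isomorphic, as in this toy setup, the learned `W` recovers the rotation exactly; the paper's point is that real morphologically rich language pairs violate this assumption, which is what the refinement step addresses.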
The TALP-UPC System for the WMT Similar Language Task: Statistical vs Neural Machine Translation
Although the problem of similar language translation has been an area of
research interest for many years, it is still far from solved. In
this paper, we study the performance of two popular approaches: statistical and
neural. We conclude that both methods yield similar results; however, the
performance varies depending on the language pair. While the statistical
approach outperforms the neural one by a difference of 6 BLEU points for the
Spanish-Portuguese language pair, the proposed neural model surpasses the
statistical one by a difference of 2 BLEU points for Czech-Polish. In the
former case, the language similarity (based on perplexity) is much higher than
in the latter case. Additionally, we report negative results for the system
combination with back-translation. Our TALP-UPC system submission won 1st place
for Czech-to-Polish and 2nd place for Spanish-to-Portuguese in the official
evaluation of the 1st WMT Similar Language Translation task.
Comment: WMT 2019 Shared Task paper
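The perplexity-based notion of language similarity invoked above can be illustrated with a tiny character-bigram model: a model trained on one language assigns lower perplexity to text from a closely related language than to text from a distant one. This is a crude illustrative proxy, not the measurement used in the paper; the training and evaluation strings are toy data.

```python
import math
from collections import Counter

def bigram_model(text, alpha=1.0):
    """Add-alpha smoothed character-bigram model trained on text."""
    bigrams = Counter(zip(text, text[1:]))
    unigrams = Counter(text[:-1])
    vocab_size = len(set(text))
    def prob(a, b):
        return (bigrams[(a, b)] + alpha) / (unigrams[a] + alpha * vocab_size)
    return prob

def perplexity(prob, text):
    """Per-character perplexity of text under the bigram model."""
    pairs = list(zip(text, text[1:]))
    logp = sum(math.log(prob(a, b)) for a, b in pairs)
    return math.exp(-logp / len(pairs))

model = bigram_model("la casa blanca es bonita")  # toy "Spanish" model
```

Scoring a string that shares character statistics with the training text yields lower perplexity than scoring an unrelated string, mirroring the Spanish-Portuguese vs. Czech-Polish similarity gap the abstract refers to.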
Continual lifelong learning in natural language processing: a survey
Continual learning (CL) aims to enable information systems to learn from a continuous data stream across time. However, it is difficult for existing deep learning architectures to learn a new task without largely forgetting previously acquired knowledge. Furthermore, CL is particularly challenging for language learning, as natural language is ambiguous: it is discrete, compositional, and its meaning is context-dependent. In this work, we look at the problem of CL through the lens of various NLP tasks. Our survey discusses major challenges in CL and current methods applied in neural network models. We also provide a critical review of the existing CL evaluation methods and datasets in NLP. Finally, we present our outlook on future research directions.
This work is supported in part by the Catalan Agencia de Gestión de Ayudas Universitarias y de Investigación (AGAUR) through the FI PhD grant; the Spanish Ministerio de Ciencia e Innovación and by the Agencia Estatal de Investigación through the Ramón y Cajal grant and the project PCIN-2017-079; and by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 947657).
Peer Reviewed. Postprint (published version).
Refinement of unsupervised cross-lingual word embeddings
Cross-lingual word embeddings aim to bridge the gap between high-resource and low-resource languages by making it possible to learn multilingual word representations even without any direct bilingual signal. The lion's share of the methods are projection-based approaches that map pre-trained embeddings into a shared latent space. These methods are mostly based on the orthogonal transformation, which assumes language vector spaces to be isomorphic. However, this criterion does not necessarily hold, especially for morphologically rich languages. In this paper, we propose a self-supervised method to refine the alignment of unsupervised bilingual word embeddings. The proposed model moves vectors of words and their corresponding translations closer to each other and enforces length- and center-invariance, thus better aligning cross-lingual embeddings. The experimental results demonstrate the effectiveness of our approach, as in most cases it outperforms state-of-the-art methods in a bilingual lexicon induction task.
We thank anonymous reviewers for their helpful comments. This work is supported in part by the Spanish Ministerio de Economía y Competitividad, the European Regional Development Fund and the Agencia Estatal de Investigación, through the post-doctoral senior grant Ramón y Cajal, the contract TEC2015-69266-P (MINECO/FEDER, EU) and the contract PCIN-2017-079 (AEI/MINECO).
Peer Reviewed. Postprint (published version).
Enhancing word embeddings with knowledge extracted from lexical resources
In this work, we present an effective method for semantic specialization of word vector representations. To this end, we use traditional word embeddings and apply specialization methods to better capture semantic relations between words. In our approach, we leverage external knowledge from rich lexical resources such as BabelNet. We also show that our proposed post-specialization method based on an adversarial neural network with the Wasserstein distance yields improvements over state-of-the-art methods on two tasks: word similarity and dialog state tracking.
This work is supported in part by the Spanish Ministerio de Economía y Competitividad, the European Regional Development Fund through the postdoctoral senior grant Ramón y Cajal and by the Agencia Estatal de Investigación through the projects EUR2019-103819 and PCIN-2017-079.
Peer Reviewed. Postprint (published version).
Findings of the first shared task on lifelong learning machine translation
A lifelong learning system can adapt to new data without forgetting previously acquired knowledge. In this paper, we introduce the first benchmark for lifelong learning machine translation. For this purpose, we provide training, lifelong and test data sets for two language pairs: English-German and English-French. Additionally, we report the results of our baseline systems, which we make available to the public. The goal of this shared task is to encourage research on the emerging topic of lifelong learning machine translation.
This work is supported in part by the Spanish Ministerio de Ciencia e Innovación, through the postdoctoral senior grant Ramón y Cajal and by the Agencia Estatal de Investigación through the projects EUR2019-103819, PCIN-2017-079 and PID2019-107579RB-I00 / AEI / 10.13039/501100011033.
Peer Reviewed. Postprint (published version).
Findings of the 2020 Conference on Machine Translation (WMT20)
This paper presents the results of the news translation task and the similar language translation task, both organised
alongside the Conference on Machine Translation (WMT) 2020. In the news task, participants
were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories. The task was also opened up
to additional test suites to probe specific aspects of translation. In the similar language translation task, participants built
machine translation systems for translating between closely related pairs of languages.